Adaptive video delivery

ABSTRACT

A video server for transmitting video across a network to a plurality of users includes a coder ( 112 ) which operates in response to call requests or instructions received across the network ( 102 ) from individual clients ( 104, 106 ). The server maintains in memory a series of user status records ( 116 ), one for each of the users. Each record includes, or refers to, a copy of a previous frame sent to the respective user. When the coder ( 112 ) receives a call request, it uses the previously-sent frame for that user to optimise the coding of the current frame to be sent, which is received from a frame grabber ( 110 ). The invention avoids the difficulties of video broadcasting while still requiring only a single instance of a video coder.

[0001] The present invention relates to adaptive video delivery, and in particular to an apparatus, system and method for supplying video images to multiple users across a network.

[0002] The satisfactory delivery of video to multiple users across a network which has no guaranteed level of service has historically been difficult to achieve. Broadcasting the video across the network is typically unsatisfactory since not all recipients will necessarily be able to handle the data at the same bit rates. Even where they could theoretically do so, different levels of congestion at different points across the network may mean that in practice the bit rate that can actually be delivered varies between recipients. Furthermore, the actual bit rate available to each recipient will normally vary with time, as the level of congestion across the network varies. The result is that users see unacceptable “gaps” or “jumps” in the video transmission.

[0003] Conventional attempts to address this problem (as used for example in the transmission of web-cam video across the internet) rely on the individual transmission to each user of a sequence of still images at a rate the user can accept. When the recipient has received the first image, a message is sent back across the internet to the web-cam server, requesting the next, and so on.

[0004] Such a method is inefficient in its use of network bandwidth, and is incapable of making use of time-dependent video compression techniques such as are used for example in motion-compensated video compression.

[0005] It is an object of the present invention to provide an improved method and apparatus for the delivery of video to multiple users across a network.

[0006] It is a further objective of the invention at least to alleviate the problems mentioned above.

[0007] According to a first aspect of the present invention there is provided a video server for transmitting video across a network to a plurality of users, comprising:

[0008] (a) a video coder receiving as input a sequence of video frames, and being responsive to a call request from a user to select an image frame from the sequence, encode it and transmit it to the user;

[0009] (b) a user status record memory holding a user status record for each respective user, each record including or being indicative of a previous frame transmitted by the coder to the respective user;

[0010] the video coder being arranged to encode the said image frame utilizing information on the previous frame for the user originating the call request.

[0011] Preferably, the said image frame consists of or includes the current frame in the sequence. Unused frames, which may have gone by, are omitted and need not be coded. The coder need not deal with only one frame at once: in one embodiment, several frames are dealt with at the same time in response to a single call request.

[0012] Preferably, the user status record for each respective user includes details of the recent history of the frames which have been transmitted to the respective user. In appropriate cases, that might include for example a copy of the most recent intra-frame, details of the most recent residual frame, details of the most recent motion data, timing information such as the time of receipt of the most recent call request and the time that the request was satisfied, the expected time of receipt of the next call request, and so on. More generally, the user status record maintains whatever historical information is necessary or convenient in order for the encoder to be able to encode the next required frame efficiently. Other administrative details of the user could also be stored, for example user preferences, thereby allowing the encoder to adjust its operation, as required, to the preferences or needs of the individual users.
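By way of illustration only, a user status record of the kind described above might be sketched as follows. The field names, and the rule used to guess the time of the next call request, are assumptions made for this example and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserStatusRecord:
    """Per-user history kept by the server (all field names are illustrative)."""
    user_id: str
    intra_frame_id: Optional[int] = None          # most recent intra-frame sent
    last_residual_id: Optional[int] = None        # most recent residual frame
    last_motion_id: Optional[int] = None          # most recent motion data
    request_received_at: Optional[float] = None   # time of the last call request
    request_satisfied_at: Optional[float] = None  # time that request was satisfied
    preferences: dict = field(default_factory=dict)

    def expected_next_request(self) -> Optional[float]:
        """Guess when the next call request will arrive."""
        if self.request_received_at is None or self.request_satisfied_at is None:
            return None
        service_time = self.request_satisfied_at - self.request_received_at
        # Assumption: the client asks again roughly one service time later.
        return self.request_satisfied_at + service_time
```

Any other predictor of the next request time could be substituted; the point is only that the record carries enough history for the coder to act on.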

[0013] The previous frame (which may but need not be the most recently sent frame for that user) may be stored as part of the user status record or, alternatively, the record may include a pointer to a frame which is stored in a frame store. To reduce memory requirements, single frames within the frame store may, when necessary, have multiple pointers pointing to them.

[0014] In the preferred embodiment, the encoder includes motion estimation and motion compensation means. In such a case, the output stream for each user may include coded intra-frame data, residual data and motion data.

[0015] The coder may be arranged to send an intra-frame to each respective user after a given period, whether or not a respective call request has been received. Typically, the period will be determined individually for each user, with the periods running concurrently.

[0016] Means are preferably provided for identifying, consolidating or reconciling similar or identical user status records. The coder may then be instructed to encode once only an image for transmission to all those users corresponding to the said similar or identical records.

[0017] More generally, the server may be arranged to respond to the needs of each of its clients.

[0018] According to a further aspect of the present invention there is provided a video server for transmitting video across a network to a plurality of users, comprising:

[0019] (a) a video coder receiving as input a sequence of video frames, and being responsive to control instructions from a user to select image frames from the sequence, encode them and transmit them to the user;

[0020] (b) a user status record memory holding a user status record for each respective user, each record including information indicative of previous control instructions received by the server from the respective user;

[0021] the video coder being arranged to encode frames for the respective user utilizing the information in the user status record for that user.

[0022] The control instructions may be more sophisticated than simply asking for one frame, as many parameters that are adjustable in the coder could be directed from the client. They may thus include or comprise server and/or coder configuration instructions.

[0023] According to a further aspect of the present invention there is provided a method of transmitting video across a network to a plurality of users, comprising:

[0024] (a) maintaining a user status record for each respective user, each record including information indicative of previous control instructions received from a respective user; and

[0025] (b) encoding and transmitting frames to the respective user utilizing the information in the user status record for that user.

[0026] According to a further aspect of the invention there is provided a video delivery system comprising a video server as defined above, and a plurality of client units for receiving video transmissions from the server. The client units could be fixed or portable computers, mobile phones or the like. Each will typically have either a screen for displaying the received video, or some means of storing it.

[0027] A client unit is preferably arranged to send a call request to the video server, asking for the next frame, on completion of receipt of the previous frame. Alternatively, a client unit may issue a pre-emptive call request, for the next frame, during receipt of the previous video frame.

[0028] Means may be provided, at a node of a network between the server and at least one of the client units, for generating a call request. This allows a call request to be sent back to the server, and the server to get on with processing the next frame, while the previous frame is still completing its transmission across the final part of the network.

[0029] According to a further aspect of the present invention there is provided a method of transmitting video across a network to a plurality of users, comprising:

[0030] (a) maintaining a user status record for each respective user, each record including or being indicative of a previous frame transmitted to the respective user; and

[0031] (b) on receipt of a call request for a user, selecting a current image from a sequence of video frames, encoding it and transmitting it to the user, the encoding utilizing information on the previous frame for the user originating the call request.

[0032] According to a further aspect of the invention there is provided a method of transmitting video across a network to a plurality of users, including transmitting a control instruction from a user to a server, the control instruction being indicative of the user status; and, on receipt of the instruction by the server, selecting a current image from a sequence of video frames, encoding it and transmitting it to the said user, the encoding utilizing information contained within the control instruction.

[0033] The information may be indicative of a frame previously sent by the server to the said user, and may include a frame number, e.g. of the most recently-transmitted intra-frame. The information may include or consist of information which has previously been sent from the server to the said user.

[0034] The invention in its various aspects provides adaptive video delivery, in an efficient way, even over communications networks having no guaranteed level of service. It provides a middle way between, on the one hand, simply broadcasting a signal to all users and, on the other, dealing with each user entirely separately and providing an individual encoder for each.

[0035] More generally, the present invention extends to methods of transmitting video using any of the apparatus features mentioned above.

[0036] The invention may be carried into practice in a number of ways and one specific embodiment will now be described, by way of example, with reference to the accompanying drawings, in which:

[0037] FIG. 1 shows a system for video delivery in accordance with the preferred embodiment of the present invention;

[0038] FIG. 2 shows a variation utilising intermediate nodes of the network;

[0039] FIG. 3 shows schematically a preferred coder for use with the present invention; and

[0040] FIG. 4 shows a preferred decoder for use with the present invention.

[0041] FIG. 1 shows schematically a video delivery system according to the preferred embodiment of the present invention. The system includes a server generally indicated at 100 which is arranged to supply video to a plurality of clients or users 104, 106 across a network 102.

[0042] The network 102 may be any type of network, such as a wired or wireless network. Examples include mobile phone networks, cable, PSTN, satellite, or indeed a heterogeneous network made up of components of different types possibly having different performances (such as the internet).

[0043] Typically (although not necessarily) the network will guarantee message delivery, but will not guarantee delivery time. Also, the bit rate that can be delivered across the network will not normally be guaranteed. The network may use any convenient networking protocol such as for example TCP/IP.

[0044] The clients 104, 106 may comprise fixed or mobile computers, mobile telephones or the like on which the received video is to be displayed or stored. Conventional elements such as processors, display, memory, input devices and so on are not shown in FIG. 1, for simplicity.

[0045] The server 100 includes a video camera 108, a frame grabber 110 and a coder 112 for coding the frames for transmission across the network. Alternatively, the camera and frame grabber may be replaced with some other source (not shown) of digitised video images such as an image store, or video data received in real time from elsewhere.

[0046] The coder 112 encodes the current frame, as supplied by the frame grabber 110, and sends appropriately addressed data across the network to the clients 104, 106 that the coder is aware of. A copy of the uncoded version of the image that has been sent is stored in a frame store 114. A memory 116 holds user status records, and these are updated to show that both of the clients have, in this instance, been sent a copy of the same frame.

[0047] Each client includes a respective decoder 118, 122 along with a frame detect section 120, 124 which detects when the frame has been completely received by the client. The decoder and the frame detect section are shown separately in FIG. 1, for simplicity, but in practice they may of course be combined.

[0048] When the frame detect section determines that the frame has been fully received, it issues a call request, back across the network, which is ultimately received and acted upon by the coder 112 within the server. The call request identifies the client and instructs the coder 112 to start sending the next frame.

[0049] On receipt of such a request, the coder 112 reads in the current frame in the video sequence from the frame grabber 110 and encodes the next frame for transmission to that client based not only on that frame but also upon the previous frame for that client, as stored in the frame store 114 and accessed by the information in the user status records 116. Typically, the coder may use motion estimation techniques to send motion-vector information rather than compressed still frames.
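The handling of a call request can be sketched, under simplifying assumptions, as follows. Frames are represented as flat lists of pixel values, and a plain frame difference stands in for the motion-compensated coding of the preferred embodiment; the class and method names are hypothetical.

```python
from typing import Dict, List, Tuple

Frame = List[int]  # toy representation: a flat list of pixel values


class ToyServer:
    def __init__(self) -> None:
        self.frame_store: Dict[int, Frame] = {}  # frame number -> frame as sent
        self.records: Dict[str, int] = {}        # client id -> number of last frame sent

    def handle_call_request(self, client: str, frame_no: int,
                            current: Frame) -> Tuple[str, Frame]:
        """Encode `current` for `client` relative to that client's previous frame."""
        prev_no = self.records.get(client)
        if prev_no is None:
            payload = ("intra", list(current))   # nothing to predict from yet
        else:
            prev = self.frame_store[prev_no]
            # Stand-in for motion-compensated coding: a simple difference.
            payload = ("residual", [c - p for c, p in zip(current, prev)])
        self.frame_store[frame_no] = list(current)
        self.records[client] = frame_no
        return payload
```

Two clients asking for the same current frame may thus receive different data, because each is coded against the frame that particular client last received.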

[0050] Alternatively, the image stored in the frame store, instead of being the uncoded version, might be a decoded version. Which image is to be stored will depend on whether the strategy of the encoder/compressor is to predict from uncompressed or from compressed frames (both are possible).

[0051] Each of the clients being served by the server 100 may be capable of receiving data at different rates, and those rates may vary with time as a function amongst other things of congestion on the network 102. That is of no great consequence, however, since each client is being sent motion-compensated video at an appropriate rate.

[0052] In the simplest arrangement, the frame store 114 stores one frame for each client that is being served; typically, that will be the most recent intra-frame for each client. The user status records may include details of the subsequent history for each client, for example details of the subsequently-sent residual data and motion data. In order to save storage space, an intra-frame is not stored separately for each user record; rather, a pointer 126 points to the appropriate image within the frame store 114.
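One possible realisation of a frame store in which several user records point to a single stored frame is sketched below. The use of a pointer count to decide when a frame may be discarded is an assumption of this example; the specification does not say how frames are released.

```python
from typing import Dict, List


class FrameStore:
    """Stores each frame once; user records hold only the frame number."""

    def __init__(self) -> None:
        self._frames: Dict[int, List[int]] = {}
        self._refs: Dict[int, int] = {}  # frame number -> number of pointers

    def point_to(self, frame_no: int, frame: List[int]) -> int:
        """Add a pointer to frame_no, storing the frame on first use."""
        if frame_no not in self._frames:
            self._frames[frame_no] = frame
            self._refs[frame_no] = 0
        self._refs[frame_no] += 1
        return frame_no

    def release(self, frame_no: int) -> None:
        """Drop one pointer; free the frame when no record points to it."""
        self._refs[frame_no] -= 1
        if self._refs[frame_no] == 0:
            del self._frames[frame_no]
            del self._refs[frame_no]

    def get(self, frame_no: int) -> List[int]:
        return self._frames[frame_no]

    def __len__(self) -> int:
        return len(self._frames)
```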

[0053] In a more sophisticated arrangement, the server may attempt to consolidate or to reconcile user status records that are similar or identical. In such a case, the coder can prepare a single updated image for transmission to the clients in question, without having to code each client's updated image separately. The resultant updated image is then sent twice, once addressed to the first client and once addressed to the second. Optionally, the server need not wait for both of the client requests to be received in such a case, and it may simply send both addressed images on receipt of the first request, without awaiting the second.
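The consolidation of identical records might, as a minimal sketch, be carried out by grouping clients according to the frame their records point to and coding once per group. The function name and the `encode` callback are hypothetical, and only exactly identical references are merged here; reconciling merely similar records is not attempted.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def encode_once_per_group(
    reference_of: Dict[str, int],
    encode: Callable[[int], bytes],
) -> List[Tuple[str, bytes]]:
    """Group clients by reference frame and run the coder once per group."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for client, ref in reference_of.items():
        groups[ref].append(client)
    out: List[Tuple[str, bytes]] = []
    for ref, clients in groups.items():
        data = encode(ref)  # one coding pass for the whole group
        # The same coded data is addressed separately to each client.
        out.extend((client, data) for client in clients)
    return out
```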

[0054] A variation of the preferred embodiment, providing greater efficiency, is shown in FIG. 2. Here, a server 210 serves three clients 212, 214, 216 across the network that includes two nodes 218, 220. It will be assumed for the purposes of this illustration that the network path/channel 222 is capable only of carrying low bit rates, or is unreliable, while network paths/channels 226, 228 are capable of handling higher bit rates with high reliability.

[0055] In order to avoid having to duplicate transmissions across the path 222 to the clients 212, 214, the node N1 218 includes within it a decoder and frame detect section, similar to those shown in FIG. 1. The server directly addresses the node, and the node issues the call requests as appropriate. As the data is received, it is passed on along the paths 226, 228 to the clients 212, 214.

[0056] Similarly, the node N2 220 includes a decoder and frame detect section, allowing the node to control the call requests necessary to serve the client 216. Depending upon the bit rates available along the lines 224 and 230, the node N2 220 may issue a call request back to the server even before all of the data has been completely transmitted along the path 230 to the client 216.

[0057] In both the FIG. 1 and the FIG. 2 variations, the client or the node may when appropriate issue a pre-emptive call request—in other words it may issue the request before the previous frame has been fully received. That allows decoding of one frame to continue in parallel with the encoding and transmission of the next one by the server.
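One way a client or node might decide when to issue a pre-emptive call request is sketched below. It assumes the receiver knows the total size of the frame in advance (for example from a header) and has estimates of its bit rate and request latency; none of these details is specified in the text, and the function name is hypothetical.

```python
def should_request_next(received_bits: int, total_bits: int,
                        bit_rate_bps: float, latency_s: float) -> bool:
    """True once the rest of the frame will arrive within one request latency."""
    remaining_s = (total_bits - received_bits) / bit_rate_bps
    return remaining_s <= latency_s
```

With a latency of zero this reduces to the ordinary case of requesting the next frame only on completion of the previous one.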

[0058] In the preferred embodiment, the coder makes use of motion-compensated compression techniques. The output stream for each of the clients thus consists of coded intra-frame data, residual data and motion data. Timing information may be recorded in the user status records 116, forcing the coder to send a new intra-frame to each of the clients after a certain lapsed interval (the length of which may vary between clients) whether or not a call request has been received. In situations where the data transfer rate is relatively constant, the server may be programmed pre-emptively to send the next frame to the client if the expected call request has not been received after the expected period.
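The per-client intra-frame refresh can be illustrated by a simple check against the timing information held for each client. The function and parameter names are assumptions made for this sketch.

```python
from typing import Dict, List


def clients_due_for_intra(
    last_intra_at: Dict[str, float],
    refresh_period: Dict[str, float],
    now: float,
) -> List[str]:
    """Return clients whose own refresh period has elapsed since their last intra-frame."""
    return [
        client
        for client, sent_at in last_intra_at.items()
        if now - sent_at >= refresh_period[client]
    ]
```

Because each client has its own period and its own starting time, the periods run concurrently and expire independently of one another.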

[0059] The system may incorporate the ability for the client to be able to instruct the server to send a desired number of bits per pixel (producing a variable frame rate). Alternatively, the user may issue the call request at regular intervals, thereby forcing the server to work at a fixed frame rate for that user. Individual users may have differing requirements.

[0060] Note that the client is able to assess both the latency and the bit rate of the information received from the server, and could base its demands on the past and/or current performance of its channel. For example, if the latency experienced in responding to client requests is long, the client might go further than pre-empting just one frame. It could run several frames ahead, or request a group of frames at a particular frame rate. The latency (the delay between a request being sent and data beginning to arrive) is of course ideally zero, but in practice (particularly across a WAN) it may be substantial. Thus, the client may anticipate and ask for bursts of frames at a rate adapted to the current bit rate of the connection.
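As a hedged illustration of how a client might size such a burst, the following heuristic requests enough frames to cover the measured latency and sets the frame rate to what the measured bit rate can carry. The formula and all names are assumptions of this sketch, not a rule given in the specification.

```python
import math
from typing import Tuple


def plan_burst(latency_s: float, bit_rate_bps: float,
               frame_bits: float) -> Tuple[int, float]:
    """Return (frames_to_request, frame_rate_hz) for the next burst."""
    transfer_s = frame_bits / bit_rate_bps          # time to receive one frame
    frames = 1 + math.ceil(latency_s / transfer_s)  # extra frames cover the latency
    return frames, 1.0 / transfer_s
```

For example, with a latency of 0.5 s, a bit rate of 64 kbit/s and frames of about 16 kbit, one frame takes 0.25 s to arrive, so the client would ask for three frames at four frames per second.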

[0061] The call request may be more sophisticated than simply asking for one frame, as many (or all) parameters that are adjustable in the coder could be directed from the client.

[0062] The client can analyse the data coming from the server, rather than burden the server with the task of ‘sounding the channels’.

[0063] More generally, the individual clients may control the server each with its own series of control instructions.

[0064] In another embodiment (not shown) the status records may be stored wholly or partly in the clients, instead of at the server. The server in such a case sends, along with each frame, information on the user status, e.g. something which at least partially identifies the history of the frame being sent. That information may then be ‘forgotten’ by the server. When requesting a next frame, the user sends back the information previously received (or at least sends some control signal which is derived from that information), thereby allowing the server to construct the next frame on the basis of the previous one.

[0065] The coder may comprise a motion-compensated coder which just requires to know the frame number of the intra-frame most recently used for motion compensation. With such a coder, the information to be transmitted may simply be the frame number.
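An arrangement in which the client holds the status information might be sketched as follows. The server keeps frames indexed by number but no per-user records, and the client echoes back the frame number it last received. A plain frame difference again stands in for motion-compensated coding, and the class name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Frame = List[int]  # toy representation: a flat list of pixel values


class StatelessServer:
    """Keeps frames by number but holds no per-user status records."""

    def __init__(self) -> None:
        self.frames: Dict[int, Frame] = {}

    def serve(self, frame_no: int, current: Frame,
              echoed_ref: Optional[int]) -> Tuple[int, str, Frame]:
        """Encode `current` against the frame number echoed back by the client."""
        self.frames[frame_no] = list(current)
        if echoed_ref is None or echoed_ref not in self.frames:
            # No usable reference: fall back to sending a complete frame.
            return frame_no, "intra", list(current)
        ref = self.frames[echoed_ref]
        return frame_no, "residual", [c - r for c, r in zip(current, ref)]
```

The frame number returned with each frame is the only state the client needs to send with its next request.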

[0066] As previously mentioned, the preferred system makes use of motion-compensated video compression. Further details of the preferred coder 112 will now be described with reference to FIG. 3. The coder may be implemented either in hardware or in software.

[0067] Frame by frame input is applied at an input 302, with the intra-frame data being passed to an intra-frame coder 304 and the inter-frame data being passed to a motion estimator 306. The motion estimator provides the parameterised motion description on line 308 which is passed to a motion compensator 310. The motion compensator outputs a predicted frame along a line 312 which is subtracted from the input frame to provide a residual frame 314 which is passed to a residual coder 316. This codes the residual frame and outputs the residual data on 318 to the output stream.

[0068] The motion description on line 308 is passed to a motion description coder 320, which codes the description and outputs motion data on a line 322.

[0069] The output stream consists of coded intra-frame data, residual data and motion data.

[0070] The output stream is fed back to a reference decoder 324 which itself feeds back a reference frame (intra or inter) along lines 326, 328 to the motion compensator and the motion estimator. In that way, the motion compensator and the motion estimator are always aware of exactly what has just been sent in the output stream. The reference decoder 324 may itself be a full decoder, for example as illustrated in FIG. 4.
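The role of the reference decoder can be illustrated with a much-simplified coding loop. The sketch assumes zero-motion prediction (the predicted frame is simply the reference frame), a uniform residual quantiser with an arbitrary step, and uncompressed intra-frames; a matching toy decoder is included to show that both ends hold the same reference. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[int]
STEP = 4  # residual quantiser step (illustrative value)


def quantise(residual: Frame) -> Frame:
    return [round(r / STEP) for r in residual]


def reconstruct(pred: Frame, q: Frame) -> Frame:
    return [p + STEP * v for p, v in zip(pred, q)]


class ToyCoder:
    def __init__(self) -> None:
        self.reference: Optional[Frame] = None  # output of the reference decoder

    def encode(self, frame: Frame) -> Tuple[str, Frame]:
        if self.reference is None:
            self.reference = list(frame)
            return "intra", list(frame)
        predicted = self.reference               # zero-motion prediction
        q = quantise([f - p for f, p in zip(frame, predicted)])
        # Feed back exactly what the receiver will reconstruct.
        self.reference = reconstruct(predicted, q)
        return "inter", q


class ToyDecoder:
    def __init__(self) -> None:
        self.reference: Optional[Frame] = None

    def decode(self, kind: str, data: Frame) -> Frame:
        if kind == "intra":
            self.reference = list(data)
        else:
            self.reference = reconstruct(self.reference, data)
        return self.reference
```

Because the coder predicts from the reconstructed frame rather than from the original input, quantisation error does not accumulate between coder and decoder from one frame to the next.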

[0071] The output stream travels across the communications network and, at the other end, is decoded by the decoder 118, 122 which is shown schematically in FIG. 4. The intra-information in the data stream is supplied to an intra-frame decoder 410, which provides decoded intra-frame information on a line 412. The inter information is supplied to a bus 414. From that bus, the residual data is transmitted along a line 416 to a residual decoder 418. Simultaneously, the motion data is supplied along a line 420 to a motion compensator 422. The outputs from the residual decoder and the motion compensator are added together to provide a decoded inter-frame on line 424.

[0072] Reference frame information is fed back along a line 424 to the motion compensator, so that the motion compensator always has current details of both the output from and the input to the decoder. 

1. A video server for transmitting video across a network to a plurality of users, comprising: (a) a video coder receiving as input a sequence of video frames, and being responsive to a call request from a user to select an image frame or group of frames from the sequence, encode it or them and transmit it or them to the user; (b) a user status record memory holding a user status record for each respective user, each record including or being indicative of a previous frame or group of frames transmitted by the coder to the respective user, the video coder being arranged to encode the said image frame or group of frames utilizing information on the previous frame or group of frames for the user originating the call request.
 2. A video server as claimed in claim 1 in which each user status record includes details of the recent history of frames which have been transmitted to the user.
 3. A video server as claimed in claim 1 in which each user status record includes a pointer to a previous frame, stored in a frame store.
 4. A video server as claimed in claim 3 in which a single frame in the frame store may have multiple pointers pointing to it.
 5. A video server as claimed in any one of the preceding claims in which the coder includes motion estimation and motion compensation means.
 6. A video server as claimed in any one of the preceding claims in which the coder generates an output stream, for each user, including coded intra-frame data, residual data and motion data.
 7. A video server as claimed in claim 6 in which the user status record includes information on the most recently transmitted intra-frame for the respective user.
 8. A video server as claimed in claim 7 in which each user status record includes details of the most recent residual frame.
 9. A video server as claimed in claim 7 in which each user status record includes details of the most recent motion data.
 10. A video server as claimed in claim 7 in which each user status record includes timing information.
 11. A video server as claimed in claim 7 in which the coder is arranged to send an intra-frame to each respective user after a given period, whether or not a respective call request has been received.
 12. A video server as claimed in any one of the preceding claims including means for identifying similar or identical user status records, and for instructing the coder to encode once only an image for transmission to all those users corresponding to the said similar or identical records.
 13. A video delivery device system comprising a video server as claimed in any one of the previous claims, and a plurality of client units for receiving video transmissions from the server.
 14. A video delivery device system as claimed in claim 13 in which the client units are mobile phones.
 15. A video delivery device system as claimed in claim 13 in which the video server is arranged to transmit video to the client units across the internet.
 16. A video delivery device system as claimed in any one of claims 13 to 15 in which the client units are arranged to send a call request to the video server on completion of receipt of a video frame.
 17. A video delivery device system as claimed in any one of claims 13 to 15 in which the client units are arranged to send a pre-emptive call request to the video server during receipt of a video frame.
 18. A video delivery device system as claimed in any one of claims 13 to 15 including means, at a node of a network between the server and at least one of the client units, for generating a call request.
 19. A method of transmitting video across a network to a plurality of users, comprising: (a) maintaining a user status record for each respective user, each record including or being indicative of a previous frame or group of frames, transmitted to the respective user; and (b) on receipt of a call request for a user, selecting a frame or group of frames from a sequence of video frames, encoding it or them and transmitting it or them to the user, the encoding utilizing information on the previous frame or group of frames, for the user originating the call request.
 20. A method as claimed in claim 19 in which each user status record includes details of the recent history of frames which have been transmitted to the user.
 21. A method of transmitting video as claimed in claim 19 including maintaining a frame store of previous frames, each user status record including a pointer to a previous frame within the frame store.
 22. A method of transmitting video as claimed in claim 21 in which a single frame in the frame store may have multiple pointers pointing to it.
 23. A method of transmitting video as claimed in any one of claims 19 to 22 including encoding a current image using motion estimation and motion compensation.
 24. A method of transmitting video as claimed in any one of claims 19 to 23 including generating an output stream, for each user, which includes coded intra-frame data, residual data and motion data.
 25. A method of transmitting video as claimed in claim 24 including maintaining in the user status record information on the most recently transmitted intra-frame for the respective user.
 26. A method as claimed in claim 19 in which each user status record includes details of the most recent residual frame.
 27. A method as claimed in claim 19 in which each user status record includes details of the most recent motion data.
 28. A method as claimed in claim 19 in which each user status record includes timing information.
 29. A method of transmitting video as claimed in claim 25 including sending an intra-frame to each respective user after a given period, whether or not a respective call request has been received.
 30. A method of transmitting video as claimed in any one of claims 19 to 29 including identifying similar or identical user status records, and encoding once only an image for transmission to all those users corresponding to the said similar or identical records.
 31. A video delivery device system as claimed in claim 13, in which the client units are arranged to send a pre-emptive call request to the video server to request more than one frame in advance.
 32. A method of transmitting video across a network to a plurality of users, comprising: (a) maintaining a user status record for each respective user, each record including or being indicative of a previous frame or group of frames sent to the said user; and (b) encoding and transmitting further frames to the said user utilizing the information in the user status record for the said user. 